Solar cycle as a distinct line of evidence constraining Earth’s transient climate response

Severity of warming predicted by climate models depends on their Transient Climate Response (TCR). Inter-model spread of TCR has persisted at ~ 100% of its mean for decades. Existing observational constraints of TCR are based on observed historical warming response to historical forcing and their uncertainty spread is just as wide, mainly due to forcing uncertainty, and especially that of aerosols. Contrary, no aerosols are involved in solar-cycle forcing, providing an independent, tighter, constraint. Here, we define a climate sensitivity metric: time-dependent response regressed against time-dependent forcing, allowing phenomena with dissimilar time variations, such as the solar cycle with 11-year cyclic forcing, to be used to constrain TCR, which has a linear time-dependent forcing. We find a theoretical linear relationship between the two. The latest coupled atmosphere-ocean climate models obey the same linear relationship statistically. The proposed observational constraint on TCR is about \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{1}}/{{3}}$$\end{document}1/3 as narrow as existing constraints. The central estimate, 2.2 oC, is at the midpoint of the spread of the latest generation of climate models, which are more sensitive than those of the previous generations.


S1. The Linear Discriminant Analysis (LDA) S1.1. The general formulism
The LDA was first introduced by Fischer in 1936 1 .Let (, ) be an  ×  geophysical dataset that contains measurements at  timesteps and  locations.(, ) is assumed to have centered in time, i.e., the temporal mean at each location is zero.Let (, ) be a group matrix for specifying  groups in time: If the i-th timestep belongs to the j-th group, then (, ) = 1; otherwise, (, ) = 0.   = diag  is a  ×  diagonal matrix containing the number of elements in the j-th group.( )  is a within-group averaging operator, so that  = ( )   is a  ×  matrix of  group means at all locations.The  ×  within-group covariance matrix is given by  = ( − ) ( − ).The  ×  between-group covariance matrix is given by  = ( ) () () .The  ×  total covariance matrix is given by  = ( ) ( − ) + ( − 1) .
The LDA aims to find a discriminant vector  that maximizes the ratio  = (  ) (  ).Given the condition  = 0,  and  are the eigenvalues and eigenvectors of the matrix equation    =   , where k denotes the k-th eigen component. =  is the  × 1 projected time series corresponding to the k-th eigencomponent. = (   ) (  ) is the k-th  × 1 discriminating spatial pattern, which is mathematically equivalent to the regression coefficients of  projected on  at each location.

S1.2. The numerical algorithm
Supplementary Section S1.1 discusses the general principle of the linear discriminant analysis and the physical meaning of  and  .Here, we present the numerical algorithm we use to obtain  and  .This numerically stable and efficient algorithm is based on the discussion in Riley 2 .

S1.2.1. Data reduction Let
√ =  be the singular value decomposition (SVD) of  divided by the degrees of freedom in time.A whitening matrix  is defined as the inverse of some "square root" of the total covariance matrix  .Given the SVD of  √ , we may take as  =    , where  and  are the reduced  and the reduced  containing only the first r singular components.r will be a truncation parameter that regularizes ; see Supplementary Section S1.3.The left inverse of  is given by  =   .The whitened dataset  is given by , which is equivalent to √ − 1 , where  is the reduced .Thus,  is reduced to a  ×  matrix.
In SVD, the truncation parameter r is usually taken to be the rank of the data matrix, i.e., singular components greater than the rank are constant components in the null space.However, such a rank does not necessarily regularize  with respect to the process of interest (the 11-year solar cycle signal in our case).Before the LDA analysis of the global warming signal, Schneider and Held 3 proposed a linear regression approach  =  ( being the regression coefficients) to pre-determine the truncation parameter using the generalized cross-validation.Their approach works if the physical phenomenon is the dominant variability in the data.However, for our 11year solar cycle study, solar modulation is generally weaker than other natural variability.There, the Schneider-Held regularization likely returns  = 1, meaning that only the arithmetic mean of the data is retained and that the solar-cycle signal is removed from the regularization.Instead of the generalized cross-validation, we post-determine r based on the values of the extracted solarcycle response κ, the correlation coefficient of the projected time series ( in Supplementary Section S1.2.3) with the solar index, and the separation parameter  .We will discuss the determination of r in Supplementary Section S1.3.

S1.2.2. The discriminants for 𝒀
Since the total covariance of the whitened dataset is unity by construction, the discriminant is simply the eigenvector of the between-group covariance matrix  =   , where  = ( )   .One may perform an eigen-decomposition on  directly but the eigendecomposition is not numerically efficient.Instead, a more popular way is to perform an SVD on a square root of  .An obvious square root is proportional to  but  has the same dimension  ×  as  , which may be large if n is large.Instead, a square root of   is introduced.
In particular, the  ×  diagonal matrix  = diag   ⁄ is a square root of   because    = .With ,   may be rewritten as    .Thus,  is a square root of  with a smaller size  × , enabling a more efficient SVD processing.Let  =  ′′ be the SVD.Then ( − 1) =  ′′  ′ , suggesting that  contains the eigenvectors of  , which are the desired discriminants for  .
[Incidentally, since ( − ) =   − ( − 1) =  −     = ′( − ′  )′ ,  is also an eigenvector of  .]S1.2.3.The discriminants for  Finally,  needs to be transformed back to the original dataset.Since  = ( )  and  =        ,   is proportional to       , which can be rewritten in the diagonalized form ( ) ( ) .Thus,  are the eigenvectors of   and hence the desired discriminants.The discriminating spatial patterns are given by ( )   (  ).Since ( )   is an identity matrix, the discriminating spatial patterns are given by the columns of   , which may be simplified as    .The projected time series are given by the columns of  .

S1.2.4. The extracted solar response κ
To extract the solar-cycle response in the surface temperature, we use the total solar irradiance, (), as the training index to define the values in (, ).If () is greater than its mean over the period 1950-2004, then (, 1) is set to 1 and (, 2) is set to 0. In contrast, if () is less than its mean over the period 1950-2004, then (, 1) is set to 0 and (, 2) is set to 1.The first discriminant is the solar-cycle response in the surface temperature.Therefore, the first row of   is the solar-cycle spatial pattern, and the first column of  is the projected time series of the solar-cycle response.
The extracted solar-cycle response, κ, in the unit of °C/W m -2 is defined as the linear regression coefficient between the projected time series and ().

S1.3. Choice of r
The truncation parameter r discussed in Supplementary Section S1.2.1 has not been determined so far.r can be any integer between 1 and the smaller value of n and p, i.e. min( ,  ).We perform the LDA analysis for all possible values of r.For each r, a different  , and hence a different projected time series, is obtained.Then we calculate three quantities: the extracted solar-cycle response κ, the correlation coefficient ρ with (), and the separation parameter γ of the first discriminant.Supplementary Figure S1 shows an example using HadCRUT5.The maximum possible value of r is 58.As r increases from 2 to 58, ρ and γ generally increase slowly except at Supplementary Figure S1.An example of the extracted solar response κ (in the unit of °C/W m -2 ; red), the correlation coefficient ρ (blue) with () , and the separation parameter γ (yellow) as a function of the truncation parameter of the SVD pre-processing.
The surface air temperature used in this example is HadCRUT5.a few values of r. κ reaches a plateau of 0.0837°C/W m -2 when  is 30.Meanwhile, ρ also reaches a plateau of 0.83.In contrast, γ keeps increasing after  is 37 and it increases mostly abruptly when  is greater than 50 (increasing from 20 when  =52 to 90 when  = 57).Such an abrupt increase in γ near the maximum possible value of r is due to the null space and therefore should be avoided.
We performed the above analysis for the GISTEMP, NOAA, and ERA observational data and the CMIP6 model historical simulations.We found that  = 30 is generally applicable to all datasets.Thus, the results presented in the main text are derived using  = 30.
We assess the maximum uncertainty due to the choice of  by comparing with another approach adopted by Camp and Tung 4 , who determined r by finding the maximum change in the rate of change of γ with respect to r.Their method would have chosen  = 38 in the case of HadCRUT5, based on the yellow curve in Supplementary Figure S1, whence  = 0.0827°C/W m -2 , which is 1.19% lower than that at  = 30.We repeat the calculation of  from the 200 ensemble members of HadCRUT5 using  = 30 and 38.The resulting distribution of  has a median value 0.7% lower than those obtained using  = 30 .Thus, we conclude that the uncertainty due to the choice of r is of order 1%.

Supplementary Section S2. Initialization configurations of CMIP6 models
In the main text, we selected a subset of CMIP6 models with sufficiently long spin-up and pre-industrial control simulations to ensure that the surface temperature in the models have equilibrated before the solar responses are simulated.Supplementary Table S1 lists the details of the initialization configurations of the CMIP6 models.The design of the CMIP6 experiments is described in Pascoe et al. 5 In one of our previous works using the GISS model 6 , we showed that the ocean may take a few thousand years to reach the equilibrium state.However, only a few CMIP6 models have such long spin-up and pre-industrial control simulations.In the main text, we selected a subset of models whose (1) the spin-up and pre-industrial control times are well documented and (2) the combined spinup and pre-industrial control simulations are at least 400 years before branching off to the historical run.S1: The numbers of historical runs available (#hruns), the equilibrium climate sensitivities (ECS) 7 , the transient climate responses (TCR) 7 , radiative forcing of CO 2 doubling 8 , the solar responses (κ), duration of the spin-ups, duration of the pre-industrial control runs (piCtrl), and the branch-off times of the spin-ups from which the pre-industrial control runs start.The details of this information can be found from the references cited.Highlighted in orange are the models used in Figures 4 and 5 of the main text.Highlighted in pink are models not used in this study because of inadequate spin-up or preindustrial control of their oceans.